Order statistics for histogram data and a box plot visualization tool
Authors
Abstract
This paper introduces new descriptive statistics for histogram data within the framework of symbolic data analysis. The main contribution is a definition of the principal order statistics (median and quartiles) of a histogram variable, based on the quantile functions associated with the empirical distribution functions of the observed histograms. The order relationship between quantile functions is defined through an appropriate probabilistic metric, the Wasserstein distance. Starting from the definitions of the median and quartile functions, we extend the classic box-plot representation to a set of quantile functions. Finally, we propose new measures of variability and skewness for a histogram variable associated with this representation. An application to real data corroborates the proposed measures and the new box-plot visualization tool.
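As an illustration of the ideas in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that mass is uniform within each histogram bin, that all bin weights are strictly positive, and that the order-1 (L1) form of the Wasserstein distance is used, for which the pointwise median of the quantile functions minimizes the sum of distances. The function names (`histogram_quantile`, `l1_wasserstein`), the toy histograms, and the pointwise quartile band are hypothetical choices made for this example only.

```python
import numpy as np

def histogram_quantile(edges, weights, t):
    """Quantile function of a histogram evaluated at levels t in [0, 1].

    Assumption: mass is uniform inside each bin and every weight is > 0,
    so the CDF is piecewise linear and strictly increasing.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(weights) / weights.sum()))
    # Invert the piecewise-linear CDF by swapping the roles of x and y.
    return np.interp(t, cdf, edges)

def l1_wasserstein(q1, q2):
    """Order-1 Wasserstein distance approximated on a uniform grid of levels."""
    return float(np.mean(np.abs(q1 - q2)))

# Midpoint grid of probability levels in (0, 1).
t = (np.arange(200) + 0.5) / 200

# Toy data: each histogram is (bin edges, bin weights). Purely illustrative.
histograms = [
    ([0, 1, 2], [0.5, 0.5]),
    ([1, 2, 4], [0.25, 0.75]),
    ([2, 3, 5], [0.5, 0.5]),
]
Q = np.array([histogram_quantile(e, w, t) for e, w in histograms])

# Pointwise median of the quantile functions: a median "quantile function"
# under the L1 Wasserstein assumption stated above.
median_q = np.median(Q, axis=0)

# Pointwise quartiles give one possible band for a box-plot-like display
# (an assumption of this sketch, not necessarily the paper's definition).
lower_q = np.quantile(Q, 0.25, axis=0)
upper_q = np.quantile(Q, 0.75, axis=0)

def total_distance(candidate):
    """Sum of L1 Wasserstein distances from a candidate to all observations."""
    return sum(l1_wasserstein(candidate, q) for q in Q)
```

Plotting `lower_q`, `median_q`, and `upper_q` against `t` gives a rough box-plot-like band for the set of quantile functions. Under these assumptions, `total_distance(median_q)` is no larger than the total distance of any observed quantile function.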
Related resources
Visual Summary Statistics UUCS-07-004
Traditionally, statistical summaries of categorical data often have been visualized using graphical plots of central moments (e.g., mean and standard deviation), or cumulants (e.g., median and quartiles) by box plots. In this work we reexamine the box plot and its relatives and develop a new hybrid summary plot that combines moment, cumulant, and density information. In view of the important ro...
Data Visualization of Outliers from a Health Research Perspective Using SAS/GRAPH and the Annotate Facility
SAS/GRAPH is a powerful tool for customizing the box plot to detect and identify outliers. This paper shows how to use the ANNOTATE facility and annotate data set to customize box plots and profile plots of outliers using data from a dietary-health study. This paper assumes: • A working knowledge of basic SAS/GRAPH procedures. • The ability to display or print graphics on your operating system...
Visualization and Exploration of Time-varying and Diffusion Tensor Medical Image Data Sets
In this work, we propose and compare several methods for the visualization and exploration of time-varying volumetric medical images based on the temporal characteristics of the data. The principal idea is to consider a time-varying data set as a 3D volume where each voxel contains a time-activity curve (TAC). We define and appraise three different TAC similarity measures. Based on these measur...
Practice of Epidemiology More Than Numbers: The Power of Graphs in Meta-Analysis
In meta-analysis, the assessment of graphs is widely used in an attempt to identify or rule out heterogeneity and publication bias. A variety of graphs are available for this purpose. To date, however, there has been no comparative evaluation of the performance of these graphs. With the objective of assessing the reproducibility and validity of graph ratings, the authors simulated 100 meta-anal...
Design and Implementation of a System for Interactive High-Dimensional Vector Field Visualization
Although the challenge of 2D flow visualization is deemed virtually solved as a result of the tremendous amount of effort invested into this problem, high-dimensional flow visualization (e.g., the visualization of flow on surfaces in 3D (2.5D), volumetric flow (3D), and flow with several attributes (nD)) still poses many challenges and unsolved problems. In this paper we describe the desi...